Survival Analysis and KM Plotting

Survival Analysis and Kaplan-Meier plotting functions

Analysis functions

df_counting

Prepares a comprehensive dataset for survival analysis using the counting process approach. Computes risk sets, event counts, Kaplan-Meier estimates, log-rank (Fleming-Harrington (\(\rho,\gamma\)) weights) and Cox model results, and quantile estimates for two groups (e.g., treatment and control), optionally handling (subject-specific case) weights (such as inverse probability weights) and stratification.

Time-dependent weights for log-rank statistics are implemented via wt.rg.S function: Supporting a variety of commonly used and custom weighting schemes for weighted log-rank and related tests. The function is flexible and can be used for Fleming-Harrington, Schemper, Xu & O’Quigley (XO), Maggir-Burman (MB), custom time-based, and exponential variants of Fleming-Harrington weights.

Key Features: - Calculates weights for use in weighted log-rank and related survival tests. - Supports standard Fleming-Harrington weights (scheme = "fh"). - Implements Schemper weights (scheme = "schemper"), XO weights (scheme = "XO"), MB weights (scheme = "MB"). - Allows for custom time-based weights (scheme = "custom_time"). - Supports exponential variants of Fleming-Harrington weights (scheme = "fh_exp1", scheme = "fh_exp2").

Typical Inputs: - S: Kaplan-Meier survival curve (vector). - scheme: Weighting scheme (e.g., "fh", "schemper", "XO", "MB", "custom_time", "fh_exp1", "fh_exp2"). - rho, gamma: Weighting parameters (for "fh"). - Scensor: (Optional) Censoring distribution (for "schemper"). - Ybar: (Optional) Risk set sizes (for "XO"). - tpoints, t.tau, w0.tau, w1.tau: (Optional) For custom time-based weights. - mb_tstar: (Optional) For MB weights. - details: (Optional) Logical; whether to return detailed output.

Typical Outputs: - A vector of weights corresponding to the input time points (or a list with details if details = TRUE).

Main Steps: 1. Determines the weighting scheme based on the scheme argument. 2. Computes weights using the chosen formula and input survival/censoring curves. 3. Handles special cases for Schemper, XO, MB, custom time-based, and exponential FH weights. 4. Returns the computed weight vector (or details if requested).

plot_weighted_km

Creates Kaplan-Meier survival plots for two groups (e.g., treatment vs. control), allowing for the use of weights (such as inverse probability weights) in the estimation. The function can display survival curves, confidence intervals, risk tables, and optionally subgroup curves or additional annotations.

KM_diff

KM_diff compares Kaplan-Meier survival curves between two groups (typically treatment vs. control) using time-to-event data. It calculates survival estimates, their differences, and provides both pointwise and simultaneous confidence intervals. The function can also perform resampling to construct simultaneous confidence bands.

Allows for arbitrary weights (non-negative) to implement (for example) propensity-score adjustment (IPW).

plotKM.band_subgroups

Plots the difference (via KM_diff calculations) in Kaplan-Meier survival curves between two groups (e.g., treatment vs. control), optionally including simultaneous confidence bands and subgroup curves. The function also displays risk tables for the overall population and specified subgroups.

wlr_dhat_estimates

Computes the weighted log-rank test statistic, its variance, the difference in survival between two groups at a specified time point (tzero), the variance of this difference, their covariance, and correlation. It supports flexible time-dependent weighting schemes via the wt.rg.S function.

library(tinytex)
library(survival)
# to install weightedKMplots
# library(devtools)
# github_install("larry-leon/weightedKMplots")
library(weightedKMplots)
# source revised functions (to be updated in weightedKMplots)
source("~/Library/CloudStorage/OneDrive-MerckSharp&DohmeLLC/documents/GitHub/weightedKMplots/R/df_counting_improved.R")
source("~/Library/CloudStorage/OneDrive-MerckSharp&DohmeLLC/documents/GitHub/weightedKMplots/R/km_calculations_helpers.R")
source("~/Library/CloudStorage/OneDrive-MerckSharp&DohmeLLC/documents/GitHub/weightedKMplots/R/kmplotting_helpers.R")
#survminer is only used for 1 plot comparison
library(survminer)

—- Data Preparation —- Prepare GBSG data

df_gbsg <- gbsg
df_gbsg$time_months <- df_gbsg$rfstime / 30.4375
tte.name <- "time_months"
event.name <- "status"
treat.name <- "hormon"
arms <- c("treat", "control")

—- Main Analyses —-

GBSG - ITT Analysis

dfcount_gbsg <- df_counting(df=df_gbsg, tte.name=tte.name, event.name=event.name, treat.name=treat.name, arms=arms, by.risk = 12, scheme = "fh", lr.digits = 4, cox.digits =3)

—- Plotting —- GBSG KM plots

par(mfrow=c(1,2))
# Mine
# Set ymax a little above 1.0 to allow for logrank placement in topleft
# topleft is default; xmed.fraction = 0.65 positions the display of the median estimates below the HR estimates ("topright" legend [default])
# For details, please see summary of xmed.fraction in the Appendix below.
plot_weighted_km(dfcount=dfcount_gbsg, conf.int=FALSE, show.logrank = TRUE, put.legend.lr = "topleft", ymax = 1.05, xmed.fraction = 0.65)
title(main="GBSG (trial) data un-weighted")
# Compare with survfit
plot_km(df=df_gbsg, tte.name=tte.name, event.name=event.name, treat.name=treat.name)
title(main="GBSG (trial) data un-weighted
      via survfit")

Compare with survminer

plot_weighted_km(dfcount=dfcount_gbsg, conf.int=TRUE, show.logrank = TRUE, ymax = 1.05)
title(main="GBSG (trial) data un-weighted")

km_fit <- survfit(Surv(time_months,status) ~ hormon, data=df_gbsg)
ggsurvplot(km_fit,conf.int=TRUE, risk.table = TRUE, break.time.by=12, xlim=c(0,86),
           tables.height = 0.2, tables.theme = theme_cleantable(), censor=TRUE, main = "GBSG (trial) data un-weighted")

—- Propensity Score Weighting (Rotterdam) —-

Prepare Rotterdam data

gbsg_validate <- within(rotterdam, {
  rfstime <- ifelse(recur == 1, rtime, dtime)
  t_months <- rfstime / 30.4375
  time_months <- t_months
  status <- pmax(recur, death)
  ignore <- (recur == 0 & death == 1 & rtime < dtime)
  status2 <- ifelse(recur == 1 | ignore, recur, death)
  rfstime2 <- ifelse(recur == 1 | ignore, rtime, dtime)
  time_months2 <- rfstime2 / 30.4375
  grade3 <- ifelse(grade == "3", 1, 0)
  treat <- hormon
  id <- as.numeric(1:nrow(rotterdam))
  SG0 <- ifelse(er <= 0, 0, 1)
})

Node positive only to correspond to GBSG population

df_rotterdam <- subset(gbsg_validate, nodes > 0)

Propensity score model

fit.ps <- glm(treat ~ age + meno + size + grade3 + nodes + pgr + chemo + er, data=df_rotterdam, family="binomial")
pihat <- fit.ps$fitted
pihat.null <- glm(treat ~ 1, family="binomial", data=df_rotterdam)$fitted
wt.1 <- pihat.null / pihat
wt.0 <- (1 - pihat.null) / (1 - pihat)
df_rotterdam$sw.weights <- ifelse(df_rotterdam$treat == 1, wt.1, wt.0)
# truncated weights
df_rotterdam$sw.weights_trunc <- with(df_rotterdam, pmin(pmax(sw.weights, quantile(sw.weights, 0.05)), quantile(sw.weights, 0.95)))

Un-weighted and weighted analyses

dfcount_rotterdam_unwtd <- get_dfcounting(df=df_rotterdam, tte.name=tte.name, event.name=event.name, treat.name=treat.name, arms=arms, by.risk=24)

dfcount_rotterdam_wtd <- get_dfcounting(df=df_rotterdam, tte.name=tte.name, event.name=event.name, treat.name=treat.name, arms=arms, by.risk=24, 
                                  weight.name="sw.weights")

—- Plotting Weighted vs Unweighted —-

par(mfrow=c(1,2))
plot_weighted_km(dfcount=dfcount_rotterdam_unwtd, xmed.fraction = 0.65)
title(main="Rotterdam un-weighted K-M curves")
plot_weighted_km(dfcount=dfcount_rotterdam_wtd, xmed.fraction = 0.65)
title(main="Rotterdam weighted K-M curves")

Compare with GBSG trial data

par(mfrow=c(1,2))
plot_weighted_km(dfcount=dfcount_gbsg, conf.int = TRUE, xmed.fraction = 0.65, show.logrank = TRUE, ymax = 1.05)
title(main="GBSG (trial) data un-weighted K-M curves")
plot_weighted_km(dfcount=dfcount_rotterdam_wtd, conf.int = TRUE, xmed.fraction = 0.65, show.logrank = TRUE, ymax = 1.05)
title(main="Rotterdam weighted K-M curves")

Simultaneous bands and point-wise CIs

Note that the first graph displays 20 (1st 20) resampled approximations to the centered survival difference

\(\Delta(t) = (\widehat{S_1} - \widehat{S_0})(t) - (S_{1} - S_{0})(t)),\) for timepoints \(t\) within “qtau” quartiles of the event times (i.e., for qtau = 2.5% we calculate \(\Delta\) for time points within the 2.5% and 97.5% quantiles).

—- Simultaneous CI bands —-

par(mfrow=c(1,2))
  temp <- plotKM.band_subgroups(
    df=df_gbsg,
    xlabel="Months", ylabel="Survival difference",
    yseq_length=5, cex_Yaxis=0.7, risk_cex=0.7,
    tau_add=42, by.risk=6, risk_delta=0.075, risk.pad=0.03,
    tte.name = "time_months", treat.name = "hormon", event.name = "status", draws.band = 5000, qtau = 0.025, show_resamples = TRUE
  )
  legend("topleft", c("Difference estimate", "95% pointwise CIs"),
         lwd=c(2,1), col=c("black"), lty=c(1,2), bty="n", cex=0.75)
  title(main="ITT population")

Subgroup differences along with ITT simultaneous band:

Note that er=0 subgroup was identified as questionable benefit.

—- Subgroup Band Plot —-

par(mfrow=c(1,1))
sg_labs <- c("er == 0","er > 0", "meno == 0", "meno ==1")
sg_cols <- c("blue", "brown", "green", "turquoise")

# Randomly sample colors from the built-in palette
# n_sg <- length(sg_labs)
# set.seed(123) # for reproducibility
# sg_cols <- sample(colors(), n_sg)

  temp <- plotKM.band_subgroups(
    df=df_gbsg,
    sg_labels = sg_labs,
    sg_colors = sg_cols,
    xlabel="Months", ylabel="Survival difference",
    yseq_length=5, cex_Yaxis=0.7, risk_cex=0.7,
    tau_add=42, by.risk=6, risk_delta=0.05, risk.pad=0.03, draws.band = 5000,
    tte.name = "time_months", treat.name = "hormon", event.name = "status" 
  )
  legend("topleft", c("ITT", sg_labs),
         lwd=2, col=c("black", sg_cols), bty="n", cex=0.75)
  title(main="ITT and subgroups")

Look at er>0 subgroup population

—- Simultaneous CI bands —-

par(mfrow=c(1,2))
  temp <- plotKM.band_subgroups(
    df = subset(df_gbsg, er > 0),
    xlabel="Months", ylabel="Survival difference",
    yseq_length=5, cex_Yaxis=0.7, risk_cex=0.7,
    tau_add=42, by.risk=6, risk_delta=0.075, risk.pad=0.03,
    tte.name = "time_months", treat.name = "hormon", event.name = "status", draws.band = 5000, qtau = 0.025, show_resamples = TRUE
  )
  legend("topleft", c("Difference estimate", "95% pointwise CIs"),
         lwd=c(2,1), col=c("black"), lty=c(1,2), bty="n", cex=0.75)
  title(main="Estrogen receptor positive (er>0) sub-population")

Rotterdam data example with propensity score weighting

—- Simultaneous CI bands —-

par(mfrow=c(1,2))
  temp <- plotKM.band_subgroups(
    df=df_rotterdam,
    xlabel="Months", ylabel="Survival difference",
    yseq_length=5, cex_Yaxis=0.7, risk_cex=0.7,
    tau_add=42, by.risk = 12, risk_delta=0.075, risk.pad=0.03,
    tte.name = "time_months", treat.name = "hormon", event.name = "status", weight.name = "sw.weights", 
    draws.band = 5000, qtau = 0.025, show_resamples = TRUE
  )
   legend("topleft", c("Difference estimate", "95% pointwise CIs"),
   lwd=c(2,1), col=c("black"), lty=c(1,2), bty="n", cex=0.75)
   title(main="ITT population weighted")

Some checks on results

Check weighted K-M SE’s with survfit

dfcount_rotterdam_wtd <- get_dfcounting(df=df_rotterdam, tte.name=tte.name, event.name=event.name, treat.name=treat.name, arms=arms, by.risk=24, 
                                  weight.name="sw.weights", check.seKM = TRUE, draws = 0)

Check SE’s calculated via resampling (draws = 5000)

Note that draws = 5000 does not take much time, but can be reduced to around 500 for sufficient accuracy.

dfcount_rotterdam_wtd <- get_dfcounting(df=df_rotterdam, tte.name=tte.name, event.name=event.name, treat.name=treat.name, arms=arms, by.risk=24, 
                                  weight.name="sw.weights", check.seKM = TRUE, draws = 5000)

Compare with the adjustedCurves package

# Compare with adjustedCurves
library(adjustedCurves)
library(pammtools)
## 
## Attaching package: 'pammtools'
## The following object is masked from 'package:stats':
## 
##     filter
df_rotterdam$hormon2 <- with(df_rotterdam, factor(hormon, labels=c("No","Yes")))

ps_model <- glm(hormon2 ~ age+meno+size+grade+nodes+pgr+chemo+er, data = df_rotterdam, family="binomial")

iptw <- adjustedsurv(data=df_rotterdam, variable="hormon2", ev_time="time_months", event = "status", method = "iptw_km", treatment_model = ps_model, conf_int = TRUE)
plot_weighted_km(dfcount=dfcount_rotterdam_wtd, conf.int = TRUE)
 title(main="Rotterdam weighted K-M curves")

 plot(iptw, conf_int = TRUE, legend.title = "Hormonal therapy")

Compare SE’s with adjustedCurves package

# Check SEs
par(mfrow=c(1,2))

df_mine <- dfcount_rotterdam_wtd

df_check <- subset(iptw$adj, group == "No")
yymax <- max(c(sqrt(df_mine$sig2_surv0[df_mine$idv0.check]), df_check$se))
toget <- df_mine$idv0.check
tt <- df_mine$at.points[toget]
yy <- sqrt(df_mine$sig2_surv0)[toget]
plot(tt,yy, type="s", lty=1, col="lightgrey", lwd=4, ylim=c(0,yymax), xlab="time", ylab="SEs")
with(df_check, lines(time, se, type="s", lty=2, lwd=1, col="red"))
title(main="Control SEs")


df_check <- subset(iptw$adj, group == "Yes")
yymax <- max(c(sqrt(df_mine$sig2_surv1[df_mine$idv1.check]), df_check$se))
toget <- df_mine$idv1.check
tt <- df_mine$at.points[toget]
yy <- sqrt(df_mine$sig2_surv1)[toget]
plot(tt,yy, type="s", lty=1, col="lightgrey", lwd=4, ylim=c(0,yymax), xlab="time", ylab="SEs")
with(df_check, lines(time, se, type="s", lty=2, lwd=1, col="red"))
title(main = "Treat SEs")

Check log-rank statistics

# dfcount_gbsg contains logrank (chisq) stats from specified scheme (default is standard logrank rho=0 and gamma=0) from 3 sources:
# (1) wlr_estimates function 
# (2) survdiff which only provides rho options so gamma !=0 is not available
# (3) z_score_calculations function (from a Cox score-test perspective)
check_results(dfcount_gbsg)
## zlr_sq=8.564781, logrank=8.564781, zCox_sq=8.564781
# Check wlr_dhat_estimates which calculates from any scheme based on counting dataset 
temp <- wlr_dhat_estimates(dfcounting = dfcount_gbsg, rho = 0, gamma = 0, scheme = "fh")
with(temp,lr^2/sig2_lr)
## [1] 8.564781
  1. Stratified by grade, just for illustration
res <- df_counting(df=df_gbsg, tte.name=tte.name, event.name=event.name, treat.name=treat.name, arms=arms, strata.name="grade")

stratified

with(res,lr_stratified^2/sig2_lr_stratified)
## [1] 7.395797
# survdiff stratified
res$logrank_results$chisq
## [1] 7.395797
 # Cox score stratified
with(res,z.score_stratified^2)
## [1] 7.395797

non-stratified (should be same as above results without stratification)

with(res,lr^2/sig2_lr)
## [1] 8.564781
with(res,z.score^2)
## [1] 8.564781

Add duplicate subject (pid==51) for testing how plots mark ties

df_add <- subset(df_gbsg, pid == 51)
df_add$status <- 1.0
df_mod <- rbind(df_gbsg, df_add)
dfcount_mod <- get_dfcounting(df=df_mod, tte.name=tte.name, event.name=event.name, treat.name=treat.name, arms=arms)
check_results(dfcount_mod)
## zlr_sq=8.551190, logrank=8.551190, zCox_sq=8.551190

For modified data

par(mfrow=c(1,2))
plot_weighted_km(dfcount_mod)
plot_km(df=df_mod, tte.name=tte.name, event.name=event.name, treat.name=treat.name)

Appendix: Survival analysis helper and main analysis functions

A set of functions for advanced Kaplan-Meier survival analysis, including estimation, event/risk matrix construction, resampling for confidence bands, and group comparison.

df_counting

Prepares a comprehensive dataset for survival analysis using the counting process approach. Computes risk sets, event counts, Kaplan-Meier estimates, log-rank and Cox model results, and quantile estimates for two groups (e.g., treatment and control), optionally handling weights and stratification.

Key Features: - Calculates risk set sizes and event counts at specified time points. - Computes weighted and unweighted Kaplan-Meier survival estimates for each group. - Provides log-rank and Cox model test results (including hazard ratios and p-values). - Estimates quantiles (e.g., median survival) and their confidence intervals. - Supports stratification and weighted analyses. - Performs internal checks for consistency and validity of KM curves and quantile calculations.

Typical Inputs: - df: Data frame containing survival data. - tte.name: Name of the time-to-event column. - event.name: Name of the event indicator column (0/1). - treat.name: Name of the treatment group column (0/1). - weight.name: (Optional) Name of the weights column. - strata.name: (Optional) Name of the stratification column. - arms: Character vector of group labels. - Additional parameters for risk table intervals, quantile estimation, and statistical tests.

Typical Outputs: - A list containing: - Risk set sizes and event counts for each group and time point. - Kaplan-Meier survival estimates and variances for each group. - Log-rank and Cox model results (test statistics, p-values, hazard ratios). - Quantile estimates (e.g., median survival) and confidence intervals. - Indices and times for events and censoring. - (If stratified) Results for each stratum.

Main Steps: 1. Validates and prepares input data. 2. Computes risk sets and event counts for each group and time point. 3. Calculates KM survival estimates and variances. 4. Performs log-rank and Cox model tests. 5. Estimates quantiles and confidence intervals. 6. Checks for consistency in KM curves and quantile calculations. 7. Returns a comprehensive list of results for downstream analysis and plotting.

Kaplan-Meier plots

Creates Kaplan-Meier survival plots for two groups (e.g., treatment vs. control), allowing for the use of weights (such as inverse probability weights) in the estimation. The function can display survival curves, confidence intervals, risk tables, and optionally subgroup curves or additional annotations.

plot_weighted_km

This function calls KM_plot_2sample_weighted_counting described below.

Key Features: - Plots weighted Kaplan-Meier survival curves for two groups. - Supports display of confidence intervals and risk tables. - Can handle weighted data (e.g., for marginal or adjusted survival). - Allows customization of plot appearance (colors, labels, axis limits, etc.). - May support subgroup overlays or additional plot annotations (depending on implementation).

Typical Inputs: - df: Data frame containing survival data. - tte.name: Name of the time-to-event column. - event.name: Name of the event indicator column (0/1). - treat.name: Name of the treatment group column (0/1). - weight.name: (Optional) Name of the weights column. - Additional graphical and control parameters (e.g., colors, axis labels, risk table options).

Typical Outputs: - A Kaplan-Meier plot with survival curves for each group, possibly with confidence intervals and risk tables. - (Optionally) Returns an object with survival estimates, time points, and other relevant results.

Main Steps: 1. Validates and prepares input data. 2. Computes weighted Kaplan-Meier survival estimates for each group. 3. Plots survival curves, confidence intervals, and risk tables. 4. Adds optional annotations or subgroup curves if specified.

Summary of KM_plot_2sample_weighted_counting Function

The KM_plot_2sample_weighted_counting function plots Kaplan-Meier survival curves for two groups (e.g., treatment and control) using weighted counting data, with options for displaying risk tables, legends, median survival, and statistical results (Cox model, log-rank test).

Key Features: - Input validation for required data fields - Extraction of survival, censoring, and risk set data for both groups - Optional checks for valid KM curves - Plotting of KM curves with customizable appearance - Optional risk table below the plot - Legends for arms, Cox model, and log-rank test - Optional annotation of median survival - Extensive customization options

Parameters:

xmed.fraction Parameter

Purpose: Controls the horizontal position of the median survival annotation (or other quantile annotation) on the Kaplan-Meier plot.

How it works: - Numeric value between 0 and 1. - Specifies the fraction of the x-axis (time axis) at which to place the median annotation text. - xmed.fraction = 0 places the annotation at the far left of the plot. - xmed.fraction = 1 places it at the far right. - Typical value like xmed.fraction = 0.80 places the annotation at 80% of the way from left to right.

Example: If your time axis goes from 0 to 24 months and xmed.fraction = 0.80, the annotation will be placed at 19.2 months on the x-axis.

Usage in the function: Allows fine-tuning of the median (or quantile) label position to avoid overlap with curves or other plot elements.

  • dfcount: List with all necessary data
  • show.cox, show.logrank: Show Cox/log-rank results
  • show_arm_legend: Show arm legend
  • arms: Names of the two groups
  • put.legend.arms, put.legend.cox, put.legend.lr: Legend positions
  • col.0, col.1: Colors for the two groups
  • ltys, lwds: Line types and widths
  • show.med: Show median survival annotation
  • …and many more for fine-tuning

Workflow: 1. Validate input 2. Extract data for both groups 3. Check KM curves (optional) 4. Plot KM curves 5. Add risk table (optional) 6. Add legends (arms, Cox, log-rank) 7. Annotate median survival (optional)

In summary, this function is a comprehensive tool for visualizing and annotating weighted Kaplan-Meier survival curves for two groups, with flexible options for statistical annotation and plot customization.

Kaplan-Meier difference analyses

Compares Kaplan-Meier survival curves between two groups (e.g., treatment vs. control) using time-to-event data. Calculates survival estimates, their differences, and provides both pointwise and simultaneous confidence intervals. Optionally, performs resampling for simultaneous confidence bands.

KM_estimates

Usage:

KM_estimates(ybar, nbar, sig2w_multiplier = NULL)

Arguments:
- ybar: Vector of risk set sizes at each time point. - nbar: Vector of event counts at each time point. - sig2w_multiplier: Optional vector for variance calculation.

Value:
A list with survival estimates (S_KM) and their variances (sig2_KM).

get_event_risk_matrices

Description:
Constructs matrices indicating event and risk status for each subject at specified time points.

Usage:

get_event_risk_matrices(U, at.points)

Arguments:
- U: Vector of observed times. - at.points: Vector of time points for evaluation.

Value:
A list with event and risk matrices.

resampling_survival

Description:
Performs resampling to generate survival curves for a group, used for constructing confidence bands.

Usage:

resampling_survival(U, W, D, at.points, draws.band, surv, G_draws)

Arguments:
- U: Vector of observed times. - W: Vector of weights. - D: Vector of event indicators. - at.points: Time points for evaluation. - draws.band: Number of resampling draws. - surv: Survival estimates. - G_draws: Matrix of random draws.

Value:
Matrix of resampled survival curves.

KM_diff

Description:
Compares Kaplan-Meier survival curves between two groups (e.g., treatment vs. control) using time-to-event data. Calculates survival estimates, their differences, and provides both pointwise and simultaneous confidence intervals. Optionally, performs resampling for simultaneous confidence bands.

Usage:

KM_diff(
  df, tte.name, event.name, treat.name, weight.name = NULL,
  at.points = sort(df[[tte.name]]), alpha = 0.05, seedstart = 8316951,
  draws = 0, risk.points, draws.band = 0, tau.seq = 0.25, qtau = 0.025,
  show_resamples = TRUE
)

Arguments:
- See above for full list.

Value:
A list containing survival estimates, variances, differences, pointwise and simultaneous confidence intervals, and (optionally) resampled curves.

Workflow Integration:
- Use get_event_risk_matrices to construct event/risk matrices for your data. - Use KM_estimates to compute survival estimates and variances. - Use resampling_survival for confidence band resampling. - Use KM_diff to compare groups and summarize results.

Example Workflow:

# Prepare data
event_risk <- get_event_risk_matrices(U = mydata$time, at.points = seq(0, max(mydata$time), by = 1))
km_results <- KM_estimates(ybar = event_risk$risk_mat, nbar = event_risk$event_mat)
resampled <- resampling_survival(U = mydata$time, W = rep(1, nrow(mydata)), D = mydata$status,
                                 at.points = seq(0, max(mydata$time), by = 1), draws.band = 1000,
                                 surv = km_results$S_KM, G_draws = matrix(rnorm(1000 * nrow(mydata)), ncol = 1000))
km_diff_results <- KM_diff(df = mydata, tte.name = "time", event.name = "status", treat.name = "group",
                           risk.points = seq(0, max(mydata$time), by = 1), draws.band = 1000)

plotKM.band_subgroups

Description:
Plots the difference in Kaplan-Meier survival curves between two groups (e.g., treatment vs. control), optionally including simultaneous confidence bands and subgroup curves. The function also displays risk tables for the overall population and specified subgroups.

Usage:

plotKM.band_subgroups(
  df, tte.name, event.name, treat.name, weight.name = NULL,
  sg_labels = NULL, ltype = "s", lty = 1, draws = 20, lwd = 2,
  sg_colors = NULL, color = "lightgrey", ymax.pad = 0.01, ymin.pad = -0.01,
  taus = c(-Inf, Inf), yseq_length = 5, cex_Yaxis = 0.8, risk_cex = 0.8,
  by.risk = 6, risk.add = NULL, xmax = NULL, ymin = NULL, ymax = NULL, ymin.del = 0.035,
  y.risk1 = NULL, y.risk2 = NULL, ymin2 = NULL, risk_offset = NULL, risk.pad = 0.01,
  risk_delta = 0.0275, tau_add = NULL, time.zero.pad = 0, time.zero.label = 0.0,
  xlabel = NULL, ylabel = NULL, Maxtau = NULL, seedstart = 8316951,
  ylim = NULL, draws.band = 20, qtau = 0.025, show_resamples = FALSE
)

Arguments:
- df: Data frame containing survival data. - tte.name, event.name, treat.name, weight.name: Column names for time-to-event, event indicator, treatment group, and weights. - sg_labels: Character vector of subgroup definitions (as logical expressions). - sg_colors: Colors for subgroup curves. - ltype, lty, lwd: Line type, style, and width for curves. - draws, draws.band: Number of draws for resampling and simultaneous bands. - color: Color for confidence band polygon. - ymax.pad, ymin.pad, ylim, ymin, ymax: Plot y-axis padding and limits. - risk_cex, cex_Yaxis: Text size for risk table and axis. - by.risk, risk.add, risk_delta, risk_offset, risk.pad: Risk table settings. - xlabel, ylabel: Axis labels. - Maxtau, tau_add, taus: Time truncation settings. - show_resamples: Logical; whether to plot resampled curves. - …and other graphical parameters.

Value:
(Invisible) list containing: - fit_itt: KM_diff results for the full population. - xpoints: Time points used. - Dhat_subgroups: Difference curves for subgroups. - s0_subgroups, s1_subgroups: Survival curves for control and treatment in subgroups. - rpoints: Risk table time points. - Risk_subgroups: Risk table counts for subgroups. - mean, lower, upper: Main difference curve and confidence intervals.

Details:
- Uses KM_diff to compute survival differences and confidence intervals. - Plots the main difference curve, confidence bands, and subgroup curves. - Displays risk tables for the overall population and subgroups. - Handles simultaneous confidence bands if draws.band > 0.

Example:

plotKM.band_subgroups(
  df = mydata,
  tte.name = "time",
  event.name = "status",
  treat.name = "group",
  weight.name = "weights",
  sg_labels = c("age > 65", "sex == 'F'"),
  sg_colors = c("red", "blue"),
  draws.band = 1000
)

wlr_dhat_estimates

Key Features: - Calculates the weighted log-rank statistic and its variance. - Computes the difference in survival at a user-specified time (tzero). - Estimates the variance of the survival difference. - Computes the covariance and correlation between the log-rank statistic and the survival difference. - Supports multiple weighting schemes (Fleming-Harrington, Schemper, Xu & O’Quigley, Maggir-Burman, custom time-based, and custom codes) via the scheme argument.

Typical Inputs: - dfcounting: List from df_counting containing risk sets, event counts, and survival estimates. - rho, gamma: Weighting parameters (for Fleming-Harrington and custom schemes). - tzero: Time point for evaluating the survival difference. - scheme: Weighting scheme (e.g., “fh”, “schemper”, “XO”, “MB”, “custom_time”, “custom_code”). - Additional arguments for specific schemes (e.g., Scensor, Ybar, tpoints, t.tau, w0.tau, w1.tau, mb_tstar).

Typical Outputs: - A list containing: - lr: Weighted log-rank test statistic. - sig2_lr: Variance of the log-rank statistic. - dhat: Difference in survival at tzero. - sig2_dhat: Variance of the survival difference. - cov_wlr_dhat: Covariance between log-rank and survival difference. - cor_wlr_dhat: Correlation between log-rank and survival difference.

Main Steps: 1. Extracts risk sets, event counts, and survival estimates from dfcounting. 2. Computes time-dependent weights using wt.rg.S according to the selected scheme. 3. Calculates the log-rank statistic, survival difference at tzero, their variances, covariance, and correlation. 4. Returns all results in a list for further analysis or reporting.